Programmable and Scalable Architecture for Graphics Processing Units

Authors

  • Carlos S. de La Lama
  • Pekka Jääskeläinen
  • Jarmo Takala
Abstract

Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high-performance floating-point processing and application-specific hardware stages for maximizing graphics throughput. In this paper we evaluate the suitability of Transport Triggered Architectures (TTA) as a basis for implementing GPUs. TTA improves scalability over traditional VLIW-style architectures, making it interesting for computationally intensive applications. We show that TTA provides high floating-point processing performance while allowing more programming freedom than vector processors. Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, making it a suitable target for general-purpose computing on GPU APIs, which have become popular

Similar resources

Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they form an important part of linear algebra algorithms in mathematics and physics. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) have been considered one of the best candidates to run them. In recent years, power consumption has been considered as one of the metr...

A Scalable and Reconfigurable Shared-Memory Graphics Cluster Architecture

If the computational demands of an interactive graphics rendering application cannot be met by a single commodity Graphics Processing Unit (GPU), multiple graphics accelerators may be utilised on multi-GPU based systems such as SLI [1] or Crossfire [2] or by a cluster of PCs in conjunction with a software infrastructure. Typically these PC cluster solutions allow the application programmer to u...

Image and Video Processing on CUDA: State of the Art and Future Directions

In the last few years a myriad of computer graphics applications have been developed using standard programming techniques, which are mainly based on multicore general-purpose processor (CPU) architectures. Due to the rapid shift towards high-definition multimedia, more and more research has been done that needs both computational resources and memory space to achieve high performance. To ...

Efficient Image Processing Using Reaction-Diffusion CNN Implemented in CUDA Technology

This paper explores an implementation model for reducing the execution time of the computationally intensive reaction-diffusion CNN (RD-CNN) model described in [1]. RD-CNNs, like standard CNNs, are computationally intensive, and this is a limiting factor in exploring their full potential, especially for image processing tasks. Hardware implementations using VLSI or FPGA architectures can provide...

Accelerating radio astronomy cross-correlation with graphics processing units

We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from “Large-N” arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on Nvidia’s Fermi architecture, sustaining u...

Publication date: 2009